NetNews Offline 2

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Offline 2 / NetNews Offline Volume 2.iso / news / comp / lang / c-part1 / 7664 < prev next >

Wrap

Internet Message Format | 1996-08-05 | 2.5 KB

Path: keats.ugrad.cs.ubc.ca!not-for-mail From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku) Newsgroups: comp.lang.c Subject: Re: unique id for a string Date: 26 Feb 1996 06:46:09 -0800 Organization: Computer Science, University of B.C., Vancouver, B.C., Canada Message-ID: <4gsh3hINNf3b@keats.ugrad.cs.ubc.ca> References: <1996Feb16.175601.114182@kuhub.cc.ukans.edu> <3129dd39.961626@news.iquest.net> <4gfpveINNe68@keats.ugrad.cs.ubc.ca> <4gih8a$b62@madeline.INS.CWRU.Edu> NNTP-Posting-Host: keats.ugrad.cs.ubc.ca In article <4gih8a$b62@madeline.INS.CWRU.Edu>, Michael A. Balfour <mab22@po.CWRU.Edu> wrote: > >In a previous article, c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku) says: > >>In article <3129dd39.961626@news.iquest.net>, >>Robert B. Clark <rclark@iquest.net> wrote: >> >On 16 Feb 96 17:56:01 CST, anh@kuhub.cc.ukans.edu wrote: >> > >> >>HELLO = H * 26^4 + E * 26 ^3 + L * 26^2 + L * 26^1 + E * 26^0 >> >> >> >>But the id is way too big. I need something that fits within a 32-bits int >> >>or a 64-bits long. >> > >> >You're thinking way too hard, perhaps. Look up the algorithm for CRCs >> >(cyclic redundancy checks/codes) in Snippets or most any other popular >> >programmer's reference. >> >>But he wants a _unique_ ID. This is not possible, since the space of possible >>words is much larger than the space of 32- or even 64-bit integers. Even for a >>limited dictionary of around 200,000 words, a function that _guarantees_ a >>unique hash for each indentifier into a 32-bit space is difficult to find. > >Actually, it's not *that* difficult to find. Try running gperf from GNU >or reading the paper cited in my previous post in this subject. The >results in the paper claim to be able to find a perfect hash function >for 262,144 18-character words in 16.087 seconds on a Sun Sparc 2. >Doesn't sound that difficult to me. :) > >>One way to get a unique mapping is to simply assign increasing numbers to the >>200,000+ words, and use a hashing technique to quickly look up a word given its >>integer tag, and look up the tag given the word. With the proper >>represenatation, both operations can be done in "constant time". > >Once again, using a perfect minimal hash function will provide an easy >method for accomplishing this. Definitely. But that assumes that the dictionary is not open to new members. Otherwise you have to recompute the hashing function each time you add a member. Does the paper discuss such ``loose ends''? I am probably going to have a look at it, since perfect hashing functions are interesting things. :) --